Multi-Object Tracking

Decode-MOT: How Can We Hurdle Frames to Go Beyond Tracking-by-Detection?

  • Decode-MOT Decision Coordinator: A novel module that adaptively chooses between tracking-by-detection (TBD) and tracking-by-motion (TBM) at each frame, increasing tracking speed with only a small loss in accuracy.
  • Contextual Learning Framework:
    • Scene Context Learning via attention-based comparison of convolutional features across frames.
    • Tracking Context Learning based on motion and object count (cardinality) similarity.
  • Self-Supervised Learning Approach: A strategy to train the decision coordinator without ground truth, using pseudo labels derived from contextual similarities between TBD and TBM results.
  • Hierarchical Confidence Association: A multi-stage track-detection association strategy that leverages track/detection confidence to reduce association ambiguity progressively.
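The per-frame switching described in the first bullet can be sketched as follows. This is a minimal illustration, not the authors' implementation: `p_tbm`, `detect`, `propagate`, and the 0.5 threshold are hypothetical stand-ins for the decision coordinator, the detector-plus-association step, the motion model, and the decision rule.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def run_tracker(
    frames: Sequence,
    p_tbm: Callable[[object, List[Box]], float],
    detect: Callable[[object], List[Box]],
    propagate: Callable[[List[Box]], List[Box]],
    threshold: float = 0.5,  # assumed decision threshold
) -> Tuple[List[str], List[Box]]:
    """Choose TBD or TBM for each frame and return the chosen modes and final tracks."""
    tracks: List[Box] = []
    modes: List[str] = []
    for t, frame in enumerate(frames):
        # The first frame (or an empty track set) always needs detections.
        if t > 0 and tracks and p_tbm(frame, tracks) >= threshold:
            tracks = propagate(tracks)  # cheap path: motion only
            modes.append("TBM")
        else:
            tracks = detect(frame)      # expensive path: run the detector
            modes.append("TBD")
    return modes, tracks
```

In this sketch, every frame handled by TBM skips the detector entirely, which is where the speed gain would come from.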

Fig 1. Accuracy and speed of the recent methods on the MOTChallenge dataset.

Proposed Model Architecture:

Fig 2. The overall architecture of our Decode-MOT. It consists of (a) a decision coordinator that predicts the probability of TBM, (b) a scene context representation module that evaluates the long-term attention between different frames, and (c) a hierarchical association that progressively links detections and tracks.
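One way to picture the hierarchical association in (c) is a two-stage matcher that handles high-confidence detections first and only then lets low-confidence detections claim the remaining tracks. The sketch below is a simplification under stated assumptions: it uses greedy IoU matching and only detection confidence, and the names `hierarchical_associate`, `conf_split`, and `min_iou` are hypothetical, not taken from the paper.

```python
from typing import Dict, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(track_ids: Iterable[int], det_ids: List[int],
                 tracks: List[Box], dets: List[Tuple[Box, float]],
                 min_iou: float) -> Dict[int, int]:
    """Greedily pair tracks and detections in order of decreasing IoU."""
    pairs = sorted(((iou(tracks[t], dets[d][0]), t, d)
                    for t in track_ids for d in det_ids), reverse=True)
    matches: Dict[int, int] = {}
    used = set()
    for score, t, d in pairs:
        if score < min_iou:
            break
        if t not in matches and d not in used:
            matches[t] = d
            used.add(d)
    return matches


def hierarchical_associate(tracks: List[Box], dets: List[Tuple[Box, float]],
                           conf_split: float = 0.5,  # assumed confidence split
                           min_iou: float = 0.3) -> Dict[int, int]:
    """Stage 1: high-confidence detections. Stage 2: leftovers vs. low-confidence ones."""
    high = [i for i, (_, c) in enumerate(dets) if c >= conf_split]
    low = [i for i, (_, c) in enumerate(dets) if c < conf_split]
    matches = greedy_match(range(len(tracks)), high, tracks, dets, min_iou)
    remaining = [t for t in range(len(tracks)) if t not in matches]
    matches.update(greedy_match(remaining, low, tracks, dets, min_iou))
    return matches
```

Resolving the confident matches first shrinks the candidate set for the ambiguous ones, which is the intuition behind associating progressively.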

Proposed Contextual Learning:

[Image: proposed contextual learning]
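As a rough illustration of how tracking context could yield pseudo labels without ground truth, the sketch below compares TBD and TBM results on the same frame using a cardinality similarity and a motion similarity. The exact measures are not stated here, so the formulas (count ratio, mean IoU over shared track IDs), the 0.7 threshold, and the function name are all assumptions made for illustration.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tracking_context_pseudo_label(
    tbd: Dict[int, Box],
    tbm: Dict[int, Box],
    threshold: float = 0.7,  # assumed similarity threshold
) -> Tuple[float, float, int]:
    """Return (cardinality similarity, motion similarity, pseudo label).

    Label 1 means TBM agreed closely enough with TBD on this frame
    that skipping detection would have been acceptable.
    """
    n_tbd, n_tbm = len(tbd), len(tbm)
    # Cardinality similarity: ratio of the smaller to the larger object count.
    card = 1.0 if max(n_tbd, n_tbm) == 0 else min(n_tbd, n_tbm) / max(n_tbd, n_tbm)
    # Motion similarity: mean IoU of boxes that share a track ID.
    shared = tbd.keys() & tbm.keys()
    if shared:
        motion = sum(iou(tbd[k], tbm[k]) for k in shared) / len(shared)
    else:
        motion = 1.0 if n_tbd == n_tbm == 0 else 0.0
    label = 1 if min(card, motion) >= threshold else 0
    return card, motion, label
```

Labels produced this way could then supervise a binary classifier such as the decision coordinator, with TBD output serving as the reference instead of annotated tracks.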

Performance Results

Comparison among our Decode-MOT, the baseline with the hierarchical association, and the baseline tracker with different TDRs on the MOT15 dataset. The percentage in [·] shows the speed gain and accuracy reduction rates of each tracker as TDR decreases.
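The bracketed percentages can be read as relative changes against a reference configuration. The sketch below assumes the reference is the same tracker at its highest TDR and that both rates are plain relative differences; the function names are hypothetical and no values from the table are reproduced.

```python
def relative_change_pct(reference: float, value: float) -> float:
    """Signed relative change of `value` with respect to `reference`, in percent."""
    return (value - reference) / reference * 100.0


def speed_gain_and_accuracy_drop(ref_fps: float, ref_acc: float,
                                 fps: float, acc: float) -> tuple:
    """Return (speed gain %, accuracy reduction %) relative to the reference run."""
    speed_gain = relative_change_pct(ref_fps, fps)
    accuracy_drop = -relative_change_pct(ref_acc, acc)  # positive when accuracy falls
    return speed_gain, accuracy_drop
```

For example, with made-up numbers, a tracker that goes from 20 to 30 frames per second while its accuracy score falls from 50 to 48 would show a 50% speed gain against a 4% accuracy reduction.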

Comparison with SOTA Methods

[Image: comparison with SOTA methods]